Accurate personalized survival prediction for amyotrophic lateral sclerosis patients

Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease. Accurately predicting the survival time for ALS patients can help patients and clinicians to plan for future treatment and care. We describe the application of a machine-learned tool that incorporates clinical features and cortical thickness from brain magnetic resonance (MR) images to estimate the time until a composite respiratory failure event for ALS patients, and presents the prediction as individual survival distributions (ISDs). These ISDs provide the probability of survival (none of the respiratory failures) at multiple future time points, for each individual patient. Our learner considers several survival prediction models, and selects the best model to provide predictions. We evaluate our learned model using the mean absolute error margin (MAE-margin), a modified version of mean absolute error that handles data with censored outcomes. We show that our tool can provide helpful information for patients and clinicians in planning future treatment.


A.2 Cox-KP
The Cox proportional hazard model (Cox-PH) [1] is one of the most popular algorithms for modeling survival data. The Cox-PH model estimates patient-specific risk scores that rank the relative risk of dying for patients (e.g., patients with risk scores of 5 might die earlier than patients with risk scores of 4). In the Cox-PH model, the hazard ratio of any two patients is assumed to be constant over time. The hazard for a patient at time $t$ is the instantaneous rate of failure at that time, given survival up to $t$. If patient A's hazard is twice that of patient B between one and three months, then the same relationship is assumed to hold at all other times. This relationship is called the proportional hazard assumption; it can be expressed as:

$$h_i(t) = h_0(t) \exp(\theta \cdot x_i),$$

where $h_i(t)$ is the hazard at time $t$ for patient $i$, and $h_0(t)$ is the baseline hazard. $\theta$ is the vector of trainable parameters, and $x_i$ is the covariate vector for patient $i$. The factor $\exp(\theta \cdot x_i)$, which is the hazard ratio of patient $i$ relative to the baseline, does not depend on time, so the hazards of any two patients remain proportional at all times. The vanilla Cox-PH model produces only a risk score for each patient. Cox-KP is the Cox proportional hazard model with the Kalbfleisch-Prentice estimator for the baseline hazard. This extended Cox model produces a survival curve for each individual patient. In our implementation, we tuned the L2 regularization constant by internal cross-validation.
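As a minimal sketch of how a baseline estimate turns a risk score into an individual survival curve: under the proportional hazard assumption, the survival function satisfies S(t | x) = S_0(t)^exp(θ · x), where S_0(t) is the baseline survival function implied by the baseline hazard. The function name `cox_survival_curve` and all numbers below are hypothetical; in Cox-KP the baseline values would come from the Kalbfleisch-Prentice estimator rather than being written by hand.

```python
import numpy as np

def cox_survival_curve(baseline_survival, theta, x):
    """Individual survival curve S(t | x) = S_0(t) ** exp(theta . x)."""
    risk = np.exp(np.dot(theta, x))  # time-independent hazard ratio vs. baseline
    return np.asarray(baseline_survival, dtype=float) ** risk

# Hypothetical baseline survival on a small time grid (illustrative values only).
baseline_survival = np.array([1.0, 0.9, 0.7, 0.4])
theta = np.array([0.5, -0.2])      # assumed fitted coefficients
patient_a = np.array([1.0, 0.0])   # higher risk score
patient_b = np.array([0.0, 0.0])   # all-zero covariates: equals the baseline

curve_a = cox_survival_curve(baseline_survival, theta, patient_a)
curve_b = cox_survival_curve(baseline_survival, theta, patient_b)
print(curve_a)
print(curve_b)
```

Because the exponent does not depend on time, the ratio of the two cumulative hazards, log(curve_a) / log(curve_b), is the same at every time point, which is the proportional hazard assumption in curve form.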

A.3 Multi-task Logistic Regression
The multi-task logistic regression (MTLR) [5] is a discrete-time individual survival model that consists of a series of logistic regression models. MTLR first discretizes the future time into multiple disjoint intervals (e.g., [0,100) days, [100,200) days). Then, the MTLR model builds a logistic regression model for each time interval to estimate the survival probability for that specific interval. To capture the dependency between different time intervals (a patient who has already died cannot come back to life), the MTLR model combines the logistic regression models from all the time intervals to compute the final prediction. Given the description of a novel patient $x$, the learned MTLR model predicts a survival probability distribution, $P(S > t \mid x)$, which gives the probability that patient $x$ will live until at least time $t$, for each $t > 0$. In our implementation, we tuned the L2 regularization constant by internal cross-validation.
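The way the per-interval models are combined can be sketched as follows. This is one common way to write the MTLR output, not necessarily the exact implementation used here: each interval boundary gets a logistic score, the score of "the event falls in interval k" is the sum of the scores of all boundaries at or after k (so a patient marked dead stays dead), and a softmax over these sequence scores gives a probability for each interval. The function name `mtlr_survival` and the input scores are hypothetical.

```python
import numpy as np

def mtlr_survival(logits):
    """Combine per-boundary logistic scores into a survival curve.

    logits: shape (m,), one score per interval boundary t_1 < ... < t_m.
    Returns (pmf, survival): pmf has m + 1 entries, one per interval
    [0, t_1), [t_1, t_2), ..., [t_m, inf); survival[j] = P(S > t_{j+1} | x).
    """
    logits = np.asarray(logits, dtype=float)
    # Score for "event in interval k": sum of logits for boundaries >= k.
    # The final 0.0 is the score for surviving past every boundary.
    scores = np.append(np.cumsum(logits[::-1])[::-1], 0.0)
    scores -= scores.max()                       # numerical stability
    pmf = np.exp(scores) / np.exp(scores).sum()  # softmax over legal sequences
    survival = 1.0 - np.cumsum(pmf)[:-1]         # non-increasing by construction
    return pmf, survival

# Hypothetical scores for three boundaries (e.g., 100, 200, 300 days).
pmf, survival = mtlr_survival([0.0, 0.0, 0.0])
print(pmf)
print(survival)
```

With all scores equal to zero, every interval is equally likely, so the survival probabilities drop by the same amount at each boundary.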

A.4 Random Survival Forest
A random survival forest (RSF) [3] is built on the well-known random forest regression algorithm. It is modified to handle censored survival data and is capable of computing individual survival distributions. This approach learns M decision trees from the training data, and the training instances are partitioned into the leaf nodes of each tree. Once the forest is trained, a test instance is passed through all M trees and reaches M leaf nodes, one per tree. For each tree, the training instances associated with the reached leaf node are collected and used to produce a single survival curve with the Kaplan-Meier (KM) estimator. The M survival curves from all trees are then averaged to produce the final predicted survival curve. We use the same implementation as Haider et al. [2] and tuned the number of trees and the nodesize using internal cross-validation.
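The prediction step described above (one Kaplan-Meier curve per reached leaf, then an average over trees) can be sketched as follows. Tree construction is omitted; the leaf contents, the time grid, and the function names `kaplan_meier` and `forest_survival` are hypothetical and only illustrate the averaging.

```python
import numpy as np

def kaplan_meier(times, events, grid):
    """KM survival probabilities evaluated at each time in `grid`.

    events[i] == 1 means the event was observed, 0 means censored.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    grid = np.asarray(grid, dtype=float)
    surv = np.ones(len(grid))
    for u in np.unique(times[events == 1]):
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & (events == 1))
        surv[grid >= u] *= 1.0 - deaths / at_risk  # step down at each event time
    return surv

def forest_survival(leaves, grid):
    """Average the KM curves of the leaves reached by one test instance."""
    curves = [kaplan_meier(t, e, grid) for t, e in leaves]
    return np.mean(curves, axis=0)

# Hypothetical training instances in the leaves reached in M = 2 trees.
leaves = [
    ([2, 4, 6], [1, 1, 0]),  # (times, event indicators) in tree 1's leaf
    ([3, 5], [1, 1]),        # tree 2's leaf
]
grid = [1, 3, 5, 7]
predicted_curve = forest_survival(leaves, grid)
print(predicted_curve)
```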

B.1 C-index
The C-index (concordance index) computes the proportion of correctly ordered pairs among all pairs of comparable instances. A pair of patients is correctly ordered if the patient with the higher predicted risk dies earlier than the other patient (i.e., if the model claims patient A has a higher risk than patient B, then the model gets a "point" if, in fact, A dies before B). A pair of patients is "comparable" if we can determine who died first. In the presence of censoring, a pair of patients is comparable if either both are uncensored (event times are not missing) or one of the patients is censored later than the observed event time of the uncensored patient. The C-index is the proportion of correct orderings, so it is a real value between 0 and 1, where 1 means all comparable pairs are ordered correctly. The C-index only measures the discriminative ability of a survival model and considers only a fraction of all pairs of patients (i.e., only comparable pairs). For the C-index, higher values indicate better models.
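The definition of comparable pairs can be made concrete with a short sketch. The function name `concordance_index` and the data are hypothetical. Counting tied risk scores as half a point is a common convention assumed here and is not stated in the text; pairs with tied times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs that the risk scores order correctly.

    events[i] == 1 means the event was observed, 0 means censored.
    A pair (i, j) is comparable when i has an observed event and
    times[i] < times[j], whether j is uncensored or censored later.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the "earlier death"
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # assumed tie convention
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical data: patient 1 is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risks = [0.9, 0.3, 0.1, 0.5]
c_index = concordance_index(times, events, risks)
print(c_index)
```

In this example, patient 1 (censored at time 4) and patient 2 (died at time 6) are not comparable, because we cannot tell who died first. Of the four comparable pairs, three are ordered correctly.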

B.2 MAE-Margin
The mean absolute error margin (MAE-margin) [2] is a modified version of the mean absolute error that includes censored instances. The mean absolute error evaluates the difference between the predicted and true survival times, where we use the median of a patient's survival curve as the predicted survival time. Ignoring censored instances can lead to biased estimates because earlier event times are less likely to be censored. MAE-margin includes censored instances by estimating the true event time ($e_i$) for each censored instance as the mean of the Kaplan-Meier survival curve ($S_{KM}(t)$), conditioned on living until the censoring time. The estimated true event time $e_i$ can be written as:

$$e_i = c_i + \frac{\int_{c_i}^{\infty} S_{KM}(t)\, dt}{S_{KM}(c_i)},$$

where $c_i$ is the censoring time for patient $i$.
When calculating the MAE-margin, a re-weighting scheme is applied to each absolute error based on the censoring time, to encode our confidence in the estimated event time. The weight is $\omega_i = 1 - S_{KM}(t_i)$ for censored instances and $\omega_i = 1$ for uncensored instances. The longer the censoring time, the more confident we are about the estimated event time. The MAE-margin, including both uncensored and censored data, can be written as:

$$\text{MAE-margin} = \frac{1}{\sum_{i=1}^{N} \omega_i} \sum_{i=1}^{N} \omega_i \left| \left[ (1 - \delta_i)\, e_i + \delta_i\, t_i \right] - \hat{t}_i \right|,$$

where $N$ is the number of instances, $\hat{t}_i$ is the predicted survival time for patient $i$, $\delta_i$ is the censoring bit, and $t_i$ is the observed time for both censored and uncensored instances: $t_i$ is the true survival time if $\delta_i = 1$, and the censoring time if $\delta_i = 0$. Lower MAE-margin values indicate better models.
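A minimal sketch of the MAE-margin computation, under stated assumptions: the Kaplan-Meier curve is estimated from the same instances being evaluated, the integral of the curve is truncated at the largest observed time, uncensored instances get weight 1, and a censored instance whose KM survival at the censoring time is zero keeps its censoring time as the target. All function names and data below are hypothetical.

```python
import numpy as np

def km_steps(times, events):
    """Distinct event times and the KM survival value right after each."""
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for u in event_times:
        s *= 1.0 - np.sum((times == u) & (events == 1)) / np.sum(times >= u)
        surv.append(s)
    return event_times, np.array(surv)

def km_at(t, event_times, surv):
    """S_KM(t) as a right-continuous step function (1 before the first event)."""
    idx = np.searchsorted(event_times, t, side="right")
    return 1.0 if idx == 0 else surv[idx - 1]

def margin_time(c, event_times, surv, t_max):
    """e = c + (integral of S_KM from c to t_max) / S_KM(c)."""
    s_c = km_at(c, event_times, surv)
    if s_c <= 0.0 or c >= t_max:
        return c  # assumption: nothing left to integrate
    inner = event_times[(event_times > c) & (event_times < t_max)]
    knots = np.concatenate(([c], inner, [t_max]))
    heights = np.array([km_at(k, event_times, surv) for k in knots[:-1]])
    return c + np.sum(heights * np.diff(knots)) / s_c

def mae_margin(times, events, predicted):
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)   # 1 = event observed, 0 = censored
    predicted = np.asarray(predicted, dtype=float)
    event_times, surv = km_steps(times, events)
    targets = times.copy()
    weights = np.ones(len(times))            # weight 1 for uncensored instances
    for i in np.where(events == 0)[0]:
        targets[i] = margin_time(times[i], event_times, surv, times.max())
        weights[i] = 1.0 - km_at(times[i], event_times, surv)
    return np.sum(weights * np.abs(targets - predicted)) / np.sum(weights)

# Hypothetical data: patient 1 is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
predicted = [3, 5, 6, 10]   # e.g., medians of the predicted survival curves
score = mae_margin(times, events, predicted)
print(score)
```

In this example the KM curve equals 0.75 at the censoring time of 4, so the censored patient receives an estimated event time of 7 and a weight of 0.25.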

D Training Variables
Supplementary Tables 2, 3, and 4 show the mean and standard deviation of the variables that we use to train our model. These variables will be normalized before training our model.

Table 2: The statistics for some clinical variables. We did not show the entire list of the clinical variables that we use, but we listed the statistics for some variables that are more representative.

Table 3: The statistics for the image variables, which are the cortical thickness extracted from MR images.

Table 4: The statistics for the image variables, which are the cortical thickness extracted from MR images. Continued from Table 3.